Abstract

Molecular dynamics simulations within a coarse-grained structure-based model were used
to probe how a knot in the native structure affects two similar proteins belonging to
the transcarbamylase family. The first protein, N-acetylornithine transcarbamylase,
contains a trefoil knot located deep within the sequence, whereas human ornithine
transcarbamylase contains no knot. In addition, we analyzed a modified transferase with
the knot removed by an appropriate change of a knot-making crossing of the protein chain.
The studies of thermally and mechanically induced unfolding processes suggest a larger
intrinsic stability of the protein with the knot.

knots in proteins | force-induced stretching | molecular dynamics | AFM

1 Introduction



After the first discovery of knotted proteins [1], considerable attention has been devoted
to identifying the types of knots that are present in the protein structure database [2], [3].
One interesting subclass consists of proteins with subtler topological configurations, called
slipknotted proteins [1]. While structure-based analyses are becoming increasingly available,
there are few studies describing the dynamical properties of knotted proteins. Simulations
of the folding of the small knotted protein 1j85, combined with experimental results [6], [7],
led Wallin et al. [5] to propose that non-native contact interactions are necessary to fold
a protein into a topologically non-trivial conformation. Interestingly, in studies of the
tightening of knots under stretching at constant velocity, the knots were found to jump
between a set of characteristic sites, typically endowed with a large curvature, before
arriving at the final fully tightened conformation [8]. These results are in direct contrast
to the well-studied case of knots in homopolymers, which tend to diffuse smoothly along
the chain and then eventually slide off [9].

It remains unclear whether knots are responsible for any biological functions or just occur
accidentally. One noteworthy suggestion is that they provide the additional stability
necessary for maintaining the global fold and function under harsh conditions [3]. Indeed,
RNA methyltransferase derived from thermophilic bacteria appears to require knots for
optimal function [10]. Consistent with the functional hypothesis, knots are usually found
within catalytic domains of enzymes [3]. Sometimes they encompass active sites [3], where
additional stability or rigidity could enhance catalysis when substrates are bound [2], [3].

Thus it is important to understand how the presence of a knot may influence the
properties and behavior of proteins in solution. In this paper, we consider three proteins
within the same superfamily which are almost identical and differ by the presence or
absence of a topological knot. Two of the proteins are N-acetylornithine transcarbamylase,
AOTCase (PDB code 1yh1), and ornithine transcarbamylase, OTCase (PDB code 1c9y), where
the former has a knot [2] and the latter does not contain this topological feature. The
third structure is a synthetic construct made from 1yh1 by redirecting the backbone so
that the knot is removed. This system will be referred to as 1yh1*. We focus on thermal
and mechanical unfolding processes in these systems and compare the properties of these
proteins in silico within a structure-based coarse-grained model as implemented in [11],
[12], [13]. In particular, we consider AFM-imposed stretching at constant velocity and at
constant force and determine the characteristic times for thermal unfolding and the
folding temperature. In all cases, the knotted protein is more stable against unfolding. We
compare these results with those observed for knots formed by side-chain disulphide bridges.

2 The proteins studied

The proteins 1yh1 and 1c9y (the latter discussed in [H]) belong to the
transcarbamylase superfamily, which is essential for arginine biosynthesis [17]. The structures
are nearly identical except that 1yh1 contains a knot in its native structure whereas 1c9y
does not [2]. The presence or absence of the knot seems to be responsible for the observed
differences in the enzymatic properties of the two proteins.

Both proteins 1yh1 and 1c9y comprise two main β domains, denoted a and b, linked
by the two interdomain helices (Figure 1). The "weaving pattern" in domain b is the
structural feature that distinguishes the two proteins topologically. The a domain in 1yh1
incorporates β strands A(40-45), B(66-70), C(79-80), D(93-94), and E(108-112), whereas
the b domain incorporates strands G(172-177), I(202-206), K(232-236), L(248-252), and
M(290-292), which create two main β sheets. Both β sheets are surrounded by many α helices.
Strands C and D are quite short, but they create an extended loop around site 80, denoted the
80's loop, which is shorter in 1c9y, where strands C and D are missing altogether.

The sequential positions at which the knot begins and terminates are denoted by n1 and
n2. These positions are determined by the KMT algorithm (see Materials and Methods).
We use this algorithm at every step of our simulations, thereby obtaining the trajectories
of the knot's ends in the sequential space, such as those shown in the bottom panels of
Figure 2. The trefoil knot present in 1yh1 extends between amino acids n1 = 172 and
n2 = 251, making it a relatively rare example of a "deep" knot, since it is positioned
relatively far from the termini of the protein. The knot encompasses almost the entire
domain b, i.e. four β strands G, K, L, I and two nearby α helices, which we denote by H1
and H2 (also present in 1c9y). An important structural difference between 1yh1 and 1c9y is
the presence of the proline-rich loop (181-183) in the former, a main building block for the
knot-making crossing of the protein chain.

The two enzymes, OTCase and AOTCase, participate in the arginine biosynthetic
pathway; however, the presence of the knot in AOTCase makes the corresponding pathway
distinct [18]. Both proteins contain two active sites: the first binds carbamyl phosphate
(CP), whereas the second site (which is modified by the knot structure) binds either
N-acetylornithine or L-ornithine, in the case of 1yh1 and 1c9y, respectively. The second
site facilitates the chemical reaction with carbamyl phosphate to form acetylcitrulline or
citrulline, correspondingly [15], [16]. We use the notation for the active sites introduced in
[15], [19], as shown in Figure 1. The first active site, located between the two domains, is the
same in the two proteins [15]. However, in 1yh1 the second active site is formed by Glu144
(within the extended 80's loop), Lys252 (from the 240's loop), and the proline-rich loop (which
creates the knot). On the other hand, in 1c9y the second active site is localized near the
240's loop [m [H]. Thus the proline-rich loop in 1yh1 does not allow the formation of
contacts between a ligand and the 240's loop (which is possible in 1c9y) and leads to a
different functional and topological motif.

The OTCase pathway shows ordered two-substrate binding with large domain movements,
whereas in the AOTCase pathway the two substrates are bound independently with
small reordering of the 80's loop, small domain closure around the active site, and a small
translocation of the 240's loop [18]. Thus it seems that the knot plays two roles here:
it changes the environment for the binding of the second substrate, N-acetylcitrulline, and - as
shown in this paper - makes the structure more stable. As a result, the functional and
thermodynamic properties of the fold are affected by the presence of the knot.

The proteins 1yh1 and 1c9y have similar numbers of native contacts (as determined
from the van der Waals radii of heavy atoms [20]), 943 and 919 respectively, so any
differences in properties must arise primarily from rearrangements in connectivities in the
contact map.

The folding, thermal and mechanical properties of these two proteins have not been
compared up to now, mostly because the structure of AOTCase was not known until
recently and because the presence of the knot makes experimental data harder to interpret.
However, some experimental work has been performed on them, as detailed in the Appendix.

We have also analyzed a modified 1yh1, in which the knot was removed by reversing the
crossing created by the parts of the backbone contained between amino acids 175-185 and
250-260. The cutting and pasting of these two parts of the structure was done using all-atom
techniques described in [21], [22]. The resulting structure 1yh1* has the same unknotted
topology as 1c9y, while it has 14 fewer contacts than the original 1yh1. This procedure
affects the contacts in the vicinity of the original knot-making crossings while leaving the global
contact map intact. The idea of rebuilding proteins to test their properties is a familiar
one - another interesting example of such protein engineering was discussed recently in [23].



3 Resistance to mechanical stretching 

One way to probe the stability of a biomolecule is to perform mechanical manipulations
on it, such as stretching. The corresponding experimental data on the two proteins are
not yet available, so we have resorted to computer modeling. We consider the case in
which the termini are connected to elastic springs. The N-terminal spring is anchored
to a substrate and the C-terminal spring is pulled either at a constant velocity, v_p, or at
constant force.
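
The constant-velocity protocol described above can be illustrated with a minimal one-dimensional sketch. It assumes a Hookean pulling spring; the spring constant `k_spring` and the function names are hypothetical, chosen only for illustration, and are not taken from the model used in this paper.

```python
def spring_force(k_spring, spring_end, terminus):
    """Hookean force on the terminus from a spring whose far end sits at spring_end.

    Both positions are measured along the pulling direction (Angstrom);
    k_spring is a hypothetical stiffness in epsilon/Angstrom^2.
    """
    return k_spring * (spring_end - terminus)


def constant_velocity_force(k_spring, v_p, t, terminus_shift):
    """Return (d, F) at time t for pulling at constant velocity v_p.

    d = v_p * t is the displacement of the pulling spring, and
    terminus_shift is how far the C terminus has actually moved.
    """
    d = v_p * t
    return d, spring_force(k_spring, d, terminus_shift)
```

Recording F against d over a trajectory gives a force-displacement curve of the kind discussed in the next subsection; in the constant-force mode the applied F is held fixed and the end-to-end distance is recorded instead.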

3.1 Stretching at constant velocity 

In this mode of manipulation, one monitors the force of resistance to pulling, F, as a
function of the pulling spring displacement, d. We usually take v_p = 0.005 Å/τ, which is
about 100 times faster than typical experimental speeds. Results obtained for v_p = 0.001
Å/τ are found to be similar. In the absence of thermal fluctuations a single unfolding
trajectory is followed. At finite temperatures, however, differences between trajectories
arise. Usually, these differences are small. Such is the case for the unknotted 1c9y,
for which a typical F(d) trajectory is shown in the rightmost panel of Figure 2. However,
for 1yh1 we identify two distinct pathways. The major pathway is shown in the middle
panel of Figure 2 and the alternative pathway in the leftmost panel. The alternative pathway
is quite rare: it was found just once in fifty trajectories. The locations of the knot
ends during stretching are displayed in the lower regions of the two panels. The immediate
conclusion is that the knotted protein 1yh1 is typically more resistant to stretching than
1c9y, since the maximum force peak, F_max, is about 3.3 ε/Å compared to 2.6 ε/Å (2.9 and 1.7
ε/Å for v_p = 0.001 Å/τ), with the energy scale ε as defined in Materials and Methods.
It is only for the rare trajectory that the values of F_max for the two proteins are nearly the
same, but even then the unfolding pathways are distinct, as evidenced in Table I. Based on
the data presented in refs. [24], [14], the unit of force, ε/Å, used here should be of order 70
pN. There are uncertainties in this estimate (of order 30 pN), but the important observation
is that we compare similar proteins with a similar effective value of ε.
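
As a rough worked example of this unit conversion, the sketch below turns reduced forces into piconewtons using the estimate of about 70 pN per ε/Å, with the quoted uncertainty of order 30 pN applied to the unit itself. The function name and the treatment of the uncertainty as a simple lower and upper bound are assumptions made for illustration.

```python
PN_PER_UNIT = 70.0   # central estimate from the text, in pN per (epsilon/Angstrom)
PN_SPREAD = 30.0     # quoted order-of-magnitude uncertainty of that estimate, in pN


def to_piconewtons(f_reduced):
    """Return (low, central, high) estimates in pN for a force in epsilon/Angstrom."""
    return (f_reduced * (PN_PER_UNIT - PN_SPREAD),
            f_reduced * PN_PER_UNIT,
            f_reduced * (PN_PER_UNIT + PN_SPREAD))
```

With these numbers, the peak forces of 3.3 and 2.6 ε/Å correspond to roughly 231 pN and 182 pN, with wide error bars in both cases.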

Table I shows that the two proteins unravel along different pathways.
Unfolding of the unknotted 1c9y starts from domain b (which is stabilized by the knot in
1yh1), and once this domain is fully unravelled the unwinding of domain a follows. In the
knotted 1yh1, domain b also begins to unfold first. However, in the typical pathway,
its unfolding stops relatively soon, just after the strands L and M are pulled apart, since
the next step would disarrange the knot. Instead, domain a is unfolded first, and only
then does the process of knot tightening begin.

We note that the first broad peak for each trajectory in Figure 2 corresponds to the
shearing motion between the two domains, which are connected by two α helices. It has
been established experimentally [18] that the interdomain interactions in 1yh1 are slightly
stronger than in 1c9y and are mainly hydrophobic, which is consistent with our observation
that the first peak in 1yh1 is higher than in 1c9y. The origin of the main force peak is also
different in the two proteins: in 1yh1 (typical pathway) it coincides with knot tightening
within domain b, which is accompanied by shearing of the β strands G+I, G+L, I+K.
In contrast, in 1c9y the main peak is associated with shearing of the β strands A+B, A+E
within domain a. On the other hand, the rare unfolding pathway of 1yh1 shares many
features with that of 1c9y. Nonetheless, due to the presence of the knot, pulling the strands
in domain b apart involves a higher force than in 1c9y (where the b-domain related peaks
appear at distances 400-700 Å).

We now consider constant speed stretching of the synthetic protein 1yh1*. Two alternative
stretching pathways are also observed in this case, as shown in Figure 3. The typical
pathway (8 out of 10 trajectories) yields F_max of just below 2.5 ε/Å, which is smaller than
F_max for the typical pathway in 1yh1 by ~0.5 ε/Å. The minor pathway yields F_max which
is smaller by ~0.2 ε/Å than the corresponding value in 1yh1. This lowering of the value
of F_max clearly points to the dynamical significance of the knot. In the typical case, the
unfolding process is found to proceed in the same way as in the unknotted 1c9y: domain
b unfolds first, followed by a. On the other hand, in the alternative trajectory, domain b
first unfolds partially, then complete unfolding of a follows, and only then is the unravelling
of b completed. This pathway is analogous to the typical unfolding of the original knotted
1yh1. However, it is the unfolding of domain a (and not b) which is responsible for the
main force peak in 1yh1*. The corresponding value of F_max ≈ 2.4 ε/Å is close to the F_max
observed for the unknotted 1c9y (where it also arises from unfolding of domain a). All
of these observations indicate that the dynamical differences between 1yh1 and 1c9y can
indeed be attributed to the presence or absence of a knot in the former.

We now discuss the process of knot tightening and focus on the knotted 1yh1. Similar
to what has been found in other proteins with knots [8], the knot ends in 1yh1 make sudden
jumps to selected metastable positions. Figure 2 shows that those jumps are correlated
with the force peaks corresponding to unfolding events in domain b. In the typical case
(the left panel), the knot moves to one of the metastable places at d ~ 1000 Å (where
F reaches F_max), which is followed by tightening of the knot, usually in two additional
steps. As shown in [8], the set of possible sites at which an end may land corresponds to
the sharp turns in the backbone (usually with proline or glycine). In our case, the sites
Gly-200, Pro-210 and Gly-230 are found to be the most likely choices. It is interesting to
note that for the rare pathway in 1yh1 (middle bottom panel in Figure 2), the knot first
moves from the native position (172, 251) to (Val-140, Gln-151). The new knot end positions
are close to Pro-139 and Pro-149, which makes this location stable. In proteins comprising
fewer than 151 amino acids, F_max tends to arise at the beginning of the stretching process
[13]. Here, however, the proteins are large and adjust to pulling by first rotating to
facilitate unfolding of other parts of their structure, and only then by unraveling the
harder knotted part.

We also analyzed stretching of tandem linkages of the proteins. Two 1c9y proteins
linked together are found to unravel in a serial fashion. This is not the case, however, for
two domains of 1yh1. When the unfolding process in one domain reaches the knot region,
the other domain starts to unfold. In the final stages both knots tighten simultaneously.

3.2 Comparison between the effects of knots and of disulphide bridges

In the current study we demonstrate that knots provide extra mechanical stability to
proteins. Thus, one may think of knots as acting analogously to disulphide bridges between
cysteines. Like knots (with the exception of a situation in which pulling unmakes the knot),
disulphide bridges cannot be removed from proteins by stretching. However, unlike
knots, they cannot slide along the sequence. Furthermore, the bridges can be weakened
through application of the reducing agent DTT, as in refs. [25], [26]. As a theoretical
analogue of the cysteine knot-containing hormones studied by Vitt et al. [27], we consider a
hypothetical mutated version of 1c9y in which the amino acids at sites 195 and 265 (one could
also consider 194 and 262) are replaced by cysteines. The resulting disulphide bridge linking
the two sites would close a knot-like loop. The presence of a disulphide bridge can be imitated
by strengthening the amplitude of the Lennard-Jones contact potential to ε_SS = ζε. We
consider ζ = 20, which makes the bridge essentially indestructible.

The rightmost panel of Figure 3 shows that the resulting F(d) pattern is quite similar
to the typical trajectory for 1yh1 shown in the left panel, except for a diverging force peak
towards the end of the process. One can endow the disulphide bond with more pliancy
by reducing ζ to the value of 10, thus allowing the stretching process to continue
(the dotted line in Figure 3). The corresponding sequence of rupture events
(L+M, followed by A+B, then A+E, then E+F, then G+L, G+I, H1, H2, and finally I+K)
differs from any of the 1yh1 unfolding pathways (Table I). However, the order of events
seems closest to the typical trajectory found for 1yh1: partial unwinding of domain b,
followed by unwinding of a and then returning to unravel b. We conclude that even though
disulphide bridges act dynamically in a way similar to knots, there are also differences in the
details.

3.3 Stretching at constant force 

The dynamical differences between the knotted and unknotted proteins should also be
visible when stretching at a constant force, F. In this mode of manipulation,
one monitors the end-to-end distance, L, as a function of time, as illustrated in Figure 4 for
selected trajectories. In each trajectory, L varies in steps, indicating transitions between a
set of metastable states that depend on the applied force. For F < 1.7 (with F given
in units of ε/Å here and below), domain b in 1c9y gets unraveled first while domain a remains
intact. Once the system reaches L just above 900 Å, it stays at this extension indefinitely.
For larger forces, the a domain also unravels and the ultimate value of L reached is ~1200 Å.
The pathways observed for the knotted protein 1yh1 are rather different. For F < 1.7
neither domain a nor b unfolds, indicating again the stabilizing role of the knot. It is only
the remaining parts of the structure that unravel, leading to the largest L of 600 Å. For
F between 1.7 and 1.9, two pathways are possible. In the first one, domain a remains
nearly intact while domain b gets unfolded, leading to tightening of the knot and to a
maximum value of L of 950 Å. This situation is analogous to the one found for 1c9y. In
the other pathway, the a domain unfolds first, but again full extension of the chain is not
achieved. For F > 1.9 the b domain is always the first to unfold. The related
movements of the knot's ends are shown in Figure 5. The knot tightening process looks similar
to the one observed in the rare trajectory for constant velocity stretching (Figure 2,
middle panel). In this case, domain a eventually unfolds, leading to full extension of the
chain. For F > 1.9, the scenarios of unfolding for 1yh1 and 1c9y are almost identical
(except for the breakage of C+D bonds, which are absent in 1c9y) and are summarized
in Table I. However, the time intervals between consecutive steps are typically longer for
1yh1, indicating a slower unfolding process. An analysis of the results of stretching at
constant velocity leads us to expect interesting behavior for F close to F_c = 1.7
ε/Å, as the values of F_max (corresponding to domain b) for 1c9y seen in Figure 2 are much
lower than F_c, while for 1yh1 some of them are above F_c (both in the typical and rare
trajectories). The characteristic value F_c is indicated in Figure 2 by the horizontal dotted
line. Indeed, for stretching with a force > F_c, we do not observe any steps in the curves
L(t) that are related to peaks 1-4 for 1c9y (Figure 2, the right panel). On the other hand,
we still observe such structures (corresponding to the highest among peaks 1-5 in Figure
2, the left and middle panels) during stretching of 1yh1 with F = F_c (and slightly higher).
Such behavior is seen in Figure 4 for F = 1.9 ε/Å.

We also analyzed in detail an example of a constant force pathway for 1yh1 for F > 1.9
ε/Å (see Figure 5 and Suppl. Mat.). The scenarios of events reported here are consistent
with the prominent role of sharp structural turns in the dynamics of the knot's ends [8]. In
particular, when the knot is tightened, its right end moves across several pinning centers,
comprising the turn between β-strand K and the small helix H2 and the sharp turn at
Pro-210, slowing down at successive pinning centers. Each slowing down manifests itself
as an accumulation of points along the tilted interval in Figure 5.

So far we have discussed the differences between 1c9y and 1yh1 as seen at the level of
single stretching trajectories. These differences are also visible after averaging over many
trajectories, as demonstrated in Figure 6. In particular, we find that for F = 2.0 (shown
in the bottom panel), which allows for full extension in both proteins, 1yh1 takes
longer to unfold than 1c9y. However, for forces higher than 2.3 ε/Å the differences in the
averaged trajectories are minor. The top panel shows that in order to match in 1yh1 the time
scale of unfolding 1c9y at F = 1.9, one has to raise the value of F to 2.2.

One can quantify the time scales of the force-induced unfolding by determining the
mean time, t_unf, needed to break all contacts with a sequential distance |j − i| bigger than
a threshold value l_c (a somewhat different criterion has been used in ref. [28]); see also a
related study by Socci et al. [29]. The smaller the l_c, the longer the corresponding t_unf.
In practice, we have found it feasible to take l_c = 8. As shown in Fig. 7, the resulting
unfolding times, t_unf(F), are longer for 1yh1 than for 1c9y, which is another manifestation
of the higher stability of the knotted protein. The stability is significantly reduced
when 1yh1 is replaced by its synthetic variant 1yh1*. Figure 7 also indicates the values of
F* - a force above which the unfolding commences instantaneously. Again, F* for 1yh1 is
substantially higher than for 1c9y and 1yh1*.
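
The unfolding criterion above can be sketched as follows. This is an illustration only: the rule that a contact counts as broken once its distance exceeds 1.5 σ_ij is an assumption made here (the text does not state the breaking distance), and the function names and data layout are hypothetical.

```python
from statistics import mean


def long_range(contacts, l_c=8):
    """Keep native contacts (i, j) whose sequence separation |j - i| exceeds l_c."""
    return [(i, j) for (i, j) in contacts if abs(j - i) > l_c]


def unfolding_time(frames, contacts, sigma, l_c=8, cutoff=1.5):
    """First time at which every long-range native contact is broken.

    frames   : iterable of (time, dist), where dist[(i, j)] is the current distance
    sigma    : dict mapping (i, j) to the contact length parameter
    cutoff   : ASSUMED breaking criterion, distance > cutoff * sigma
    Returns None if the trajectory ends before unfolding.
    """
    watched = long_range(contacts, l_c)
    for time, dist in frames:
        if all(dist[c] > cutoff * sigma[c] for c in watched):
            return time
    return None


def mean_unfolding_time(times):
    """Mean over trajectories that did unfold (None entries are skipped)."""
    return mean(t for t in times if t is not None)
```

Lowering `l_c` adds more contacts to the watched set, so the condition is met later, which matches the trend noted above.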

4 Thermal stability 

We now consider unfolding via thermal fluctuations, following the approach of Ref. [30].
We define the unfolding time, t_u, as the median duration of a trajectory that starts in the
native state and stops when all contacts with |j − i| > l_c get broken. For consistency
with the mechanical studies, we choose l_c = 8. The temperature dependence of t_u for the
proteins is shown in Figure 8. Clearly, for any given T, it takes substantially longer to
unravel 1yh1 than either 1c9y or 1yh1*. For instance, at k_B T/ε = 1.?> the ratio of t_u between
1yh1 and 1c9y is about 2.

It should be noted that the mere fact that the contacts with a sequential length
larger than l_c are broken does not necessarily mean that the knot itself has loosened and
become untied. In fact, according to our studies of thermal unfolding, the knotted proteins
unfold in two steps: first the long-ranged contacts break and only then, at much longer
time scales, does the knot become undone. Thus the unfolding follows the N → UK → U
path, where N stands for the native state, UK for the unfolded knotted state and U for
the totally unfolded, unknotted state. Due to the topological constraints present in the
UK state, its entropy is considerably lower than that in the U state; thus the free energy
difference between UK and N is much higher than that between U and N, which leads to
the increased stability of the native state. Similar entropy-based strategies for increased
stabilization are found in other topologically constrained proteins [31], e.g. in proteins with
circular backbones, which have been shown to be highly resistant to enzymatic, thermal and
chemical degradation [32]. There is also another, energy-based reason for the increased
stability of 1yh1 and, possibly, of other knotted proteins. Namely, the non-trivial topology
of a protein may lead to a more energetically favored conformational state. This is the
case for the three proteins considered here: the knotted 1yh1 has the lowest native state
energy. The native state energy of 1yh1*, the unknotted counterpart of 1yh1, exceeds that
of 1yh1 by 14ε, whereas that of 1c9y is higher than that of 1yh1 by about 24ε. Thus one of the
reasons why knots may be preferred in certain proteins is that they lead to deep native
state minima.

Apart from the higher stability of 1yh1, its longer unfolding times can also be explained
in terms of topological frustration [23], [33]. It arises when only a particular order of contact
breaking allows the protein to unfold. When this order is incorrect, certain geometrical
constraints arise which do not allow for unfolding, and some contacts are forced to form
again. Therefore a protein unfolds in a series of steps, a process called backtracking, which
involve refolding and unfolding. The consequence of this geometric bias is an unusually long
unfolding time. There are obvious geometrical constraints present in 1yh1 related to its
knotted structure, so it is likely that its unfolding is dominated by topological frustration
and takes more time than the unfolding of the unknotted 1c9y or 1yh1*. A particular example
of backtracking which arises in 1yh1 is presented in detail in the Appendix.

To assess the magnitude of fluctuations around the native state we measured P_0(T),
defined as the fraction of time during which all native contacts are established for a
trajectory starting in the native conformation. This quantity can be regarded as yet
another measure of stability. However, even though P_0 is calculated based on relatively
long trajectories of lO^r, this is still only a small fraction of the expected unfolding time
in this range of temperatures. These trajectories are therefore not ergodic and probe only
the vicinity of the native state basin. The results are shown in the inset of Figure 8: the
left panel shows the data for the entire-length proteins, whereas the middle and right panels
are for the a and b domains, respectively. In the right inset panel, for domain b (which
contains the knot in 1yh1), the data points corresponding to 1c9y are shifted towards lower
temperatures relative to 1yh1. A similar but smaller shift towards lower temperatures
is also observed for the synthetic 1yh1*. On the other hand, data points for domain a (the
middle panel) and for the whole protein (left panel) are similar. Thus differences in P_0 are
confined to domain b and indicate a higher stability of domain b in the knotted protein.
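
A minimal sketch of how P_0 could be evaluated from a stored trajectory is given below. It assumes equally spaced snapshots and, as an illustrative choice not stated in the text, treats a contact as established when its distance does not exceed 1.5 σ_ij; the names are hypothetical.

```python
def fraction_native(frames, contacts, sigma, cutoff=1.5):
    """Fraction of snapshots in which ALL listed native contacts are established.

    frames   : sequence of dicts, dist[(i, j)] = current distance of contact (i, j)
    contacts : contacts to monitor (whole protein, or one domain only)
    sigma    : dict mapping (i, j) to the contact length parameter
    cutoff   : ASSUMED criterion, a contact is present if distance <= cutoff * sigma
    """
    total = 0
    native = 0
    for dist in frames:
        total += 1
        if all(dist[c] <= cutoff * sigma[c] for c in contacts):
            native += 1
    return native / total if total else 0.0
```

Restricting `contacts` to those within domain a or domain b gives the per-domain curves of the kind shown in the middle and right inset panels.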

4.1 Thermal untying of a knot in the 1yh1 protein

As mentioned above, untying of the knot involves much longer time scales than those
of long-range contact breaking. However, the unknotting times decrease with increasing
temperature. Meaningful studies could be performed for k_B T/ε = 1.2 (and higher). We
have found that the knot opens more readily on the side closer to the C terminus, while
its N-terminal side is more stable. This is in agreement with the results of [34] on the
asymmetry of (slip)knots, and with the fact that they arise much more often closer to the
N terminus. Examples of conformations corresponding to different ways of thermal untying
of the knot are shown in Figure 9. For each terminus, there are two possibilities: either
it is the last site to leave the knot or else it is a leader that pulls the rest of the knotted
loop behind it. The latter circumstance is known as the formation of a slipknot [4]. It is
interesting to note that application of a high temperature has occasionally been found to
generate short-lived additional (slip)knots, especially when the native knot has disappeared.

As generally expected and demonstrated explicitly in ref. [30], the process of thermal
unfolding is statistically the reverse of folding. Thus the phenomena we observe for unfolding
should also be observed in folding processes. This also suggests that the presence of
non-native attractive contacts is not necessary for the formation of a knot. Indeed, in a
subsequent paper we show that proteins of nontrivial topology have the ability to fold to
their native states without any non-native interactions involved. Such non-native contacts
have been vital in the folding simulations of Wallin et al. [5]. More details and particular
examples concerning thermal untying, and the backtracking that may accompany it, are
presented in the Appendix.

5 Discussion and conclusions 

We have considered three very similar proteins - one with a knot and two without -
and determined their properties by using a coarse-grained native-geometry based model.
Both mechanically and thermally, the protein with the knot has been found to be more
robust and is characterized by longer unfolding times, which we attribute to topological
and geometric frustration. The larger robustness of 1yh1 relative to 1c9y relates to the
experimental results on the OTCase and AOTCase pathways. The OTCase pathway shows
two-substrate binding involving large domain movements. In this pathway, the order
in which the substrates are bound is well defined. On the other hand, in the AOTCase
pathway the two substrates are bound independently. This process involves small reordering
of the 80's loop, small domain closure around the active site, and a small translocation
of the 240's loop [18].

Other findings can be summarized as follows. The unknotted variant of 1yh1 has been
found to behave like the unknotted 1c9y. Therefore we conclude that it is the nontrivial
knot topology that is responsible for the peculiar properties of 1yh1. Disulphide bridges
may imitate the existence of knots to some degree. The kinetics of knot untying and thus,
by reversal, the kinetics of knot formation may involve the generation of other knots
and slipknots. According to [15], the presence of the knot motif in AOTCase affects the
way N-acetylcitrulline is bound to the second active site and thus changes the arginine
biosynthetic pathway. This observation can provide important information on potential
targets for specific inhibition of bacterial pathogens. Such inhibitors would not affect the
more common OTCase and would thus provide a specific non-toxic method for controlling certain
pathogens.

Taken together, these findings show that relatively small structural differences between
the proteins which nevertheless alter the topology of the backbone result in dramatic changes
in their mechanical properties and stability. This research reveals that there is a strong
relationship between the topological properties and functional features of biomolecules.

6 Materials 

6.1 Coarse-grained model 

The coarse-grained molecular dynamics modeling we use is described in detail in refs. 
[11], [12], [13]. In particular, the native contacts between the Cα atoms in amino acids i 
and j, separated by the distance $r_{ij}$, are described by the Lennard-Jones potential 
$V_{ij} = 4\epsilon \left[ (\sigma_{ij}/r_{ij})^{12} - (\sigma_{ij}/r_{ij})^{6} \right]$. 
The length parameter $\sigma_{ij}$ is determined pair-by-pair so that 
the minimum in the potential corresponds to the native distance. The energy parameter 
$\epsilon$ is taken to be uniform. As discussed in ref. [24], other choices for the energy scale and 
the form of the potential are either comparable or worse when tested against experimental 
data on stretching. Folding is usually optimal at a temperature $k_B T/\epsilon$ of around 0.3 ($k_B$ is the 
Boltzmann constant), which is taken to play the role of an approximate room 
temperature. Implicit solvent features come through the velocity-dependent damping and 
the Langevin thermal fluctuations in the force. We consider the overdamped situation, in which 
the characteristic time scale, $\tau$, is controlled by diffusion and not by ballistic 
motion, making it of the order of a ns instead of a ps. The analysis of the knot-related 
characteristics is made along the lines described in ref. [8]. 
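
The contact potential described above can be illustrated with a short sketch. It is not the code used in this work: the function names and the example native distance of 6 Å are assumptions made for illustration only. It relies on the standard fact that a Lennard-Jones potential with length parameter σ has its minimum at r = 2^(1/6) σ, so placing the minimum at the native distance fixes σ for each pair.

```python
def sigma_from_native_distance(d_native):
    """Length parameter sigma_ij that puts the potential minimum at d_native."""
    return d_native / 2.0 ** (1.0 / 6.0)


def contact_energy(r, d_native, eps=1.0):
    """Lennard-Jones energy V_ij of one native contact at distance r."""
    sigma = sigma_from_native_distance(d_native)
    x = (sigma / r) ** 6
    return 4.0 * eps * (x * x - x)


def contact_force(r, d_native, eps=1.0):
    """Radial force -dV_ij/dr (positive values are repulsive)."""
    sigma = sigma_from_native_distance(d_native)
    x = (sigma / r) ** 6
    return 24.0 * eps * (2.0 * x * x - x) / r


if __name__ == "__main__":
    d = 6.0  # hypothetical native C-alpha distance, in Angstrom
    for r in (5.5, 6.0, 6.5, 9.0):
        print(r, contact_energy(r, d), contact_force(r, d))
```

At the native distance the energy equals −ε and the force vanishes, which is what "the minimum in the potential corresponds to the native distance" requires.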

6.2 KMT algorithm 

We determine the sequential extension of a knot, i.e. the minimal segment of amino 
acids that can be identified as a knot, by using KMT algorithm [3^. It involves removing 
the Cα atoms, one at a time, as long as the backbone does not intersect a triangle set by 
the atom under consideration and its two immediate sequential neighbors. 
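
The reduction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: all function names are hypothetical, the segment-triangle test is the standard Möller-Trumbore construction, backbone segments that share a vertex with the triangle are skipped, and segments lying in the plane of the triangle are treated as non-intersecting. The sketch covers only the bead-removal loop; locating the ends of the knot along the sequence is not shown.

```python
import numpy as np


def segment_hits_triangle(p, q, a, b, c, eps=1e-9):
    """Moller-Trumbore test: does segment p->q cross triangle (a, b, c)?"""
    d = q - p
    e1, e2 = b - a, c - a
    h = np.cross(d, e2)
    det = np.dot(e1, h)
    if abs(det) < eps:  # segment parallel to the triangle plane
        return False
    s = p - a
    u = np.dot(s, h) / det
    if u < 0.0 or u > 1.0:
        return False
    qv = np.cross(s, e1)
    v = np.dot(d, qv) / det
    if v < 0.0 or u + v > 1.0:
        return False
    t = np.dot(e2, qv) / det
    return 0.0 <= t <= 1.0  # crossing point lies within the segment


def bead_is_removable(chain, i):
    """True if no other backbone segment crosses triangle (i-1, i, i+1)."""
    a, b, c = chain[i - 1], chain[i], chain[i + 1]
    for j in range(len(chain) - 1):
        if j in (i - 2, i - 1, i, i + 1):  # segments touching the triangle
            continue
        if segment_hits_triangle(chain[j], chain[j + 1], a, b, c):
            return False
    return True


def kmt_reduce(points):
    """Remove beads one at a time; return original indices of the survivors."""
    chain = [np.asarray(p, dtype=float) for p in points]
    labels = list(range(len(chain)))
    changed = True
    while changed:
        changed = False
        i = 1  # the two termini are never removed
        while i < len(chain) - 1:
            if bead_is_removable(chain, i):
                del chain[i]
                del labels[i]
                changed = True
            else:
                i += 1
    return labels


if __name__ == "__main__":
    # An unknotted helix-like chain collapses to its two termini.
    ts = np.linspace(0.0, 4.0 * np.pi, 40)
    helix = [(np.cos(t), np.sin(t), 0.2 * t) for t in ts]
    print(kmt_reduce(helix))
```

For an unknotted chain the reduction ends with only the two termini left, whereas a knotted chain is expected to leave additional beads that cannot be removed.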

Appendices 

A The two proteins 

In the classical arginine biosynthetic pathway involving OTCase, N-acetylornithine is 
first deacetylated to L-ornithine, which is then carbamylated to form citrulline. In 
the other pathway, involving AOTCase, the process is reversed: N-acetylornithine is first 
carbamylated to N-acetylcitrulline and then deacetylated to citrulline. 

The folding, thermal, and mechanical properties of these two proteins have not been 
compared up to now, mostly because the structure of AOTCase was not known until 
recently and because the presence of the knot makes experiments harder to interpret. 
However, it has been determined that both substrates, N-acetylcitrulline (1yh1) and 
L-norvaline (1c9y), obey Michaelis-Menten kinetics [ITj. It has also been found [17] that 
the affinity of the human OTCase for ornithine is 10 times greater than the affinity of AOTCase 
for N-acetylornithine, while the affinity for carbamyl phosphate is approximately five times 
smaller. The thermal stability has been measured only for the OTCase 1c9y: the 
temperature at which 50% of the enzyme activity is lost was determined to be 56±1 °C. 

It has to be noted that there exists another member of the transcarbamoylase family, 
SOTCase (extracted from B. fragilis argF', 1js1), which also contains a knot. The RMSD 
between the two knotted structures, based on 280 equivalent Cα positions, is around 1.4 Å [15]. There 
are also structures which are similar to the human OTCase 1c9y, like E. coli ATCase (PDB 
code 1ekx), with an RMSD of 1.7 Å based on 262 equivalent Cα atoms [15]. The superpositions 
of these four proteins with their substrates have been carried out in [H], where only slight 
differences between corresponding pairs of the knotted and unknotted proteins were found. 
We have also checked that the properties of 1js1 and 1ekx in the model are nearly identical 
to those of 1yh1 and 1c9y, respectively. For this reason, our analysis is focused on the 
1yh1 and 1c9y proteins. 

B Typical trajectory of a knot in 1yh1 under stretching 
by a constant force 

We now analyze an example of the constant-force pathway for 1yh1 for F > 1.9 ε/Å in 
more detail (cf. Figure 5 in the main paper and Fig. 10 below). The right end of the knot 
is initially located close to β-strand L (248-252). The left end of the knot is located at 
the sharp turn involving glycine (170), and it is rather hard to force this end to leave this 
turn. For this reason, when a constant force is applied, the right end of the knot starts to 
move: as the protein backbone is being pulled out of the loop, the sequential location of 
this end decreases. Interestingly, the decrease is linear in time (the center part in Figure 5) 
and it involves motion of the right end across several pinning centers comprising the turn 
at His-237, located between β-strand K (232-236) and the small helix H2 (238-244), and the 
sharp turn at Pro-210. These centers are seen as accumulations of points along the tilted 
interval in Figure 5. Nonetheless, these pinning sites hold the knot less strongly than 
site 170 holds the left end, and hence it is only the right end that can slide. The situation 
changes when the right end reaches Gly-200 within a sharp turn at the end of helix H1. 
Interestingly, the left end is now "pushed" out of site 170 by the right end and it starts 
to move to the left, expanding the knot region a bit and resulting in a translation of the 
whole knot to the left. This translational motion stops on the left at Gly-164 at the end of 
the long helix (147-164). At the same time the right end passes through a half-loop (with 
a sharp and rigid turn involving prolines at 181 and 183) between β-strand G (172-177) 
and helix H1 (185-199). Finally, the right end stops at site 175 and the knot becomes 
fully tightened. This sequence of events is consistent with the observation made in ref. [8] 
on the role of sharp structural turns in the dynamics of the knot's ends. 
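
Since the pinning centers show up as accumulations of points in the trajectory of a knot end, one simple way to pick them out is to count how many recorded frames the end spends at each sequential position. The sketch below is only an illustration of that idea: the function name, the dwell threshold, and the short example trajectory are all assumptions (the numbers are made up and are not simulation data).

```python
from collections import Counter


def pinning_sites(end_positions, min_dwell):
    """Map each site visited in at least min_dwell frames to its frame count."""
    counts = Counter(end_positions)
    return {site: n for site, n in counts.items() if n >= min_dwell}


if __name__ == "__main__":
    # Hypothetical right-end positions, one entry per saved frame.
    trajectory = [250, 250, 250, 245, 237, 237, 237, 237, 220, 210, 210, 210, 200]
    print(pinning_sites(trajectory, min_dwell=3))
```

Sites where the end merely passes through are filtered out by the threshold, while sites at which it lingers are reported together with their dwell counts.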

We enclose video presentations of the stretching of the two proteins, generated 
using our implementation of the Go-like model. The first animation presents the protein 
1c9y and the second the protein 1yh1. The process of tightening of the knot in the animation 
corresponds to Figure 5 of the main text. The knotted region in 1yh1 and the corresponding 
region in 1c9y are marked in green. 

C Threshold force F* 

We define F* as the threshold force at which the free energy barrier for the transition 
from the native to the unfolded state vanishes and the protein begins to unfold in a downhill 
manner. Unfolding is then essentially immediate, without any intermediate states (see the 
inset in Figure 7 in the main text). The force F* is analogous to that found in simulations 
of ubiquitin [28], above which the unfolding times are short and distributed log-normally 
and below which they are substantially longer and distributed exponentially. For forces 
below F*, the median unfolding times follow a trend which, in general, is a superposition 
of exponential functions [28]. For forces above F*, unfolding times also decrease with 
increasing force, but at a much slower rate. 
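
The two regimes differ in the shape of the distribution of unfolding times, and a quick way to see the difference in a set of trajectories is to compare the mean with the median. This is only a rough diagnostic suggested here for illustration, not the analysis used in this work. For an exponential distribution the mean-to-median ratio equals 1/ln 2 ≈ 1.44, whereas for a log-normal distribution it equals exp(s²/2), which is close to 1 when the width s of the distribution of ln t is small. The function name and the sample values below are assumptions.

```python
import math
import statistics


def mean_to_median_ratio(times):
    """Ratio of the mean to the median of a sample of unfolding times."""
    return statistics.mean(times) / statistics.median(times)


if __name__ == "__main__":
    # Evenly spaced quantiles of a unit-rate exponential distribution.
    n = 1001
    exponential_like = [-math.log(1.0 - (k + 0.5) / n) for k in range(n)]
    print(mean_to_median_ratio(exponential_like), 1.0 / math.log(2.0))

    # A narrow, nearly symmetric sample (hypothetical values).
    narrow = [10.0, 10.5, 9.5, 10.2, 9.8]
    print(mean_to_median_ratio(narrow))
```

A ratio near 1.44 is consistent with exponentially distributed times, while a ratio near 1 points to a narrow distribution; a proper distinction would require fitting both forms to the full histogram.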

For 1yh1 and 1c9y we find F* of 3.2 and 2.5 ε/Å respectively, as indicated by the arrows 
in Figure 7 in the main text. The data shown in this figure are based on 300 trajectories 
for F < 1.9 ε/Å and 100 trajectories for F > 1.9 ε/Å. The relative shift in the location of F* 
is notable: F* for the knotted 1yh1 is higher, pointing to a higher stability. 

D Thermal untying of the knot in the 1yh1 protein 

Fig. 9 in the main text shows the knot untying process in a schematic way. Fig. 11 
shows the corresponding conformations in more detail, with the original position of 
the knot along the backbone marked. 

E Backtracking 

A process which involves repeated breaking and re-forming of the same group of 
contacts due to a topological barrier is called backtracking [33], [23]. Complete thermal unfolding 
of the knotted proteins (i.e. unfolding to the trivial topology, with the knot untied) 
would not be possible without such backtracking. An example of backtracking due to 
knot topology is the untying of the protein 1yh1 from the N terminus. In this case the knot 
has to move along almost the entire chain. The translocation of the knot along the backbone 
is correlated with refolding due to backtracking of a part of the structure, as seen in Fig. 
12. The bottom panel in that figure shows the number of contacts Q in domain b during 
unfolding. The top panel shows the number of contacts Q inside domain a. In the 
native state the position of the knot is stabilized by contacts G+I and I+K (in domain b). 
These contacts break periodically (black line); however, until 2800τ the knot remains localized in 
domain b, while in domain a all contacts keep breaking randomly. When the knot moves 
to domain a at 2800τ, periodic refolding of contacts A+E is observed (top panel). 
Eventually, the knot slides off the chain through the N terminus. 
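
The repeated breaking and re-forming of a contact group can be counted directly from a time series of the number of contacts Q. The sketch below is one possible way to do this, assuming two hypothetical thresholds that define when the group counts as broken and when it counts as re-formed; the function name, the thresholds, and the example series are illustrative and are not taken from the simulations.

```python
def count_reformations(q_series, broken_below, formed_above):
    """Count recoveries of Q to >= formed_above after a drop to <= broken_below."""
    events = 0
    broken = False
    for q in q_series:
        if q <= broken_below:
            broken = True  # the contact group has come apart
        elif q >= formed_above and broken:
            events += 1  # the same group has formed again
            broken = False
    return events


if __name__ == "__main__":
    # Hypothetical number of contacts in one group, one entry per saved frame.
    q = [10, 9, 2, 1, 8, 10, 3, 2, 9, 10, 1]
    print(count_reformations(q, broken_below=3, formed_above=8))
```

Using two separate thresholds avoids counting small fluctuations around a single cutoff as separate backtracking events.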
Table 1: The order of the contact breaking for different pathways 




Figure 1: Left: the cartoon representation of the unknotted 1c9y (top) and knotted 1yh1 
(bottom) proteins. Both consist of two β-domains, denoted as a and b. Right: domain b is 
topologically trivial in 1c9y (top), while knotted in 1yh1 (bottom). The arrows indicating 
the active sites are arranged in such a way that the upper (lower) arrow corresponds to 
the first (second) active site. The knot in the native state in 1yh1 extends between amino 
acids 172 and 251 (whose locations are denoted in a schematic figure on the right). 

Figure 2: Top: Unfolding curves of force versus protein length, F(d), for stretching at constant 
velocity v = 0.005 Å/τ, as explained in the text. The horizontal dotted line indicates a 
reference value of F = 1.7 ε/Å, corresponding to the height of many of the force peaks. It is drawn 
to facilitate panel-to-panel comparisons. The initial force peaks do not relate to the a and 
b domains. The remaining force peaks are labeled 1 through 7, except that in the middle 
panel there is an extra peak between 4 and 5 corresponding to shearing of helices that are 
coupled to the a domain. In each case, the force peak labeled 1 arises due to shearing of 
the L strand against the M strand. Table 1 lists which contacts break (i.e. $r_{ij} > 1.5\sigma_{ij}$) at 
the remaining peaks. Bottom: Sequential movement of the knot's ends during 
the knot tightening process corresponding to the trajectory shown above. 

Figure 3: F(d) curves for the synthetic 1yh1* without the knot (left and center panel), 
and for the mutated 1c9y with the disulphide bridge (right panel). In the latter, the solid 
line corresponds to 20 and the dotted line to C=10. 

Figure 4: The time dependence of the end-to-end distance when stretching by constant 
force for the indicated values of the force. The left panels refer to the unknotted 
protein and the right panels to the knotted one. Schematic pictures of the conformations 
corresponding to the metastable state are displayed on the right hand side of each panel, 
where the a and b domains are depicted as blobs. 

Figure 5: A typical trajectory of knot's end locations for stretching at constant force. 

Figure 6: The average end-to-end distance as a function of time for the forces indicated. 

Figure 7: The unfolding times t_unf as a function of the force applied. The solid fat line 
(with squares) and solid fine line (with asterisks) are respectively for 1yh1 and 1yh1*, the 
dotted line (with circles) for 1c9y. Inset: for F=3.2 the protein is stretched instantaneously, 
without formation of any metastable states, and with small trajectory-to-trajectory 
variations. 




Figure 8: The dependence of the median unfolding time on temperature. The solid fat line 
(with squares) and solid fine line (with asterisks) are respectively for 1yh1 and 1yh1*, the 
dotted line (with circles) for 1c9y. Inset: The temperature dependence of the probability 
of preserving all the native contacts in 1yh1 and 1c9y. 

Figure 9: Three possible ways of thermal untying of the knot. From the left to the right: 
simple from the C terminus, simple from the N terminus and through formation of a 
slipknot. 




Figure 10: Left: a typical trajectory of the knot's end locations for stretching at constant force. 
Right: the corresponding region of the knot in 1yh1 shown in the cartoon representation. 
The pinning centers are indicated in both panels. 

Figure 11: Three possible ways of thermal untying of the knot. From the left to the 
right: simple from the C terminus, simple from the N terminus and through formation of a 
slipknot. The cyan (medium thick) line indicates the native location of the knot, whereas 
the green (thick) line in the center and left panels shows the instantaneous position of the 
knot. In the right panel, the position of the slipknot is indicated by the combined lines of 
medium and large thickness. 




Figure 12: Thermal untying of the protein accompanied by backtracking. The bottom 
and top panels show respectively the number of contacts Q in domains b and a during 
unfolding. The black line approximates periodic breaking of contacts G+I and I+K in the 
first phase of unfolding, when the knot is still localized in domain b. The knot moves from 
domain b to a around 2800τ, and eventually slides off the chain through the N terminus. 